Abstract

In this study, an artificial neural network (ANN) based on particle
swarm optimization (PSO) was developed for time series prediction.
The hybrid ANN+PSO algorithm was applied to the Mackey-Glass chaotic
time series for the short-term prediction $x(t+6)$. The prediction
performance was evaluated and compared with other studies available
in the literature. We also present properties of the dynamical system
through a study of the chaotic behaviour obtained from the predicted
time series. Next, the hybrid ANN+PSO algorithm was complemented with
a Gaussian stochastic procedure (called stochastic hybrid ANN+PSO) in
order to obtain a new estimator of the predictions, which also allowed
us to compute the uncertainties of the predictions for noisy
Mackey-Glass chaotic time series. Thus, we studied the impact of noise
for several cases with a white noise level from 0.01 to 0.1.

1 Introduction 

The prediction of time series plays an important role in many applied
fields of science, such as engineering, biology, physics, and
meteorology. In particular, owing to their dynamical properties, the
analysis and prediction of chaotic time series have been of interest
to the scientific community. In general, chaotic time series are
modeled by delay-differential equations; standard examples are the
Mackey-Glass system m and the Ikeda equation m (for more examples see
O). Many methods have also been used in chaotic time series analysis
H. In recent decades, however, different types of artificial neural
networks (ANN) have been widely used for forecasting chaotic time
series, for example, the back-propagation algorithm [5], radial basis
function networks [6], and recurrent networks [7].

On the other hand, the analysis of real-life time series requires
taking into account the error propagation of input uncertainties. The
observed data can be contaminated by different types of instrumental
noise, such as white noise or noise proportional to the signal (the
latter mainly arises from instrumental calibration). In the modeling
of chaotic time series, the impact of noise can be treated as an
errors-in-variables problem, where the noise is propagated into the
prediction model. In the literature, the impact of noise on chaotic
time series prediction has barely been considered. We can find
studies where the algorithms were tested from a theoretical point of
view (for example, see El in [Hill] [HI), and works where the
implementation was applied to real-life time series (for example, see
GHimisi). In addition, some authors have proposed modifications to
the standard methods in order to improve the prediction performance
in the presence of noise niiu.

In this work, we used the Mackey-Glass chaotic time series to study
the short-term prediction ($x(t+6)$) with an artificial neural network
optimized with a particle swarm algorithm (ANN+PSO). The method was
applied to noiseless and noisy chaotic time series. In order to carry
out the error propagation of the input noise, this hybrid algorithm
was complemented with a Gaussian stochastic procedure to compute a new
estimator of the predictions and their uncertainties. Note that ANNs
have been used in combination with PSO in several applications.
Principally, these applications include feedforward neural network
training [15][16][17][Tl, design of recurrent neural networks (m,
design of radial basis function networks [20], and neural network
control for nonlinear processes [21]. In addition, there are several
current versions of PSO available in the literature (for example, see
the reviews (221HH1211)), but our application uses a standard PSO
with inertia weight [25]. The use of a PSO with inertia weight is
based on the following reasons: 1) this version of PSO is easy to
understand and implement owing to its simple concept and learning
strategy; 2) as pointed out in [26], the PSO with inertia weight EH
and the PSO with constriction factor EH are mathematically
equivalent, and the PSO with constriction factor can be considered a
special case of the PSO with inertia weight ESEiD (note that this
equivalence can be applied to other improved PSO algorithms that
include a varying inertia weight schedule); 3) the inertia weight PSO
algorithm is quite stable to population changes [23]; 4) the
advantages and disadvantages of PSO variants depend on the problem to
be solved [22] [HIED; 5) this is a first approach to the study of the
effect of noise on dynamical systems using an ANN combined with the
inertia weight PSO algorithm, so the present study may motivate and
help researchers working in the field of evolutionary algorithms to
develop new hybrid models or to apply other existing PSO models to
this problem. To the best of the authors' knowledge, there is no
application for forecasting noisy chaotic time series such as the one
presented here, using a hybrid method that combines an ANN with a PSO
algorithm.

The organization of this paper is as follows. In Section 2 we present
a detailed description of the hybrid ANN+PSO method. Section 3 and
Section 4 present the simulation, the algorithm implementation, and
the principal results obtained for the forecasting of noiseless and
noisy chaotic time series, respectively. Finally, conclusions are
given in Section 5.

2 Hybrid ANN+PSO algorithm 

Artificial neural networks (ANN) are similar to biological neural
networks in that they perform functions collectively and in parallel
using connected nodes. Thus, ANNs are a family of biologically
inspired statistical learning algorithms.

In this study, we consider one of the most successful and frequently
used types of neural networks: a multilayer feedforward neural
network, which is usually trained with a backpropagation learning
algorithm (gradient descent on the error). Here, the ANN was
implemented by replacing standard backpropagation with particle swarm
optimization (PSO).

PSO is a population-based optimization tool, where the system is
initialized with a population of random particles and the algorithm
searches for optima by updating generations [28]. In each iteration,
the velocity of each particle $j$ is calculated according to the
following formula [29]:

$$ v_j^{k+1} = \omega v_j^{k} + c_1 r_1 \left(\psi_j^{k} - s_j^{k}\right) + c_2 r_2 \left(\psi_g - s_j^{k}\right) \tag{1} $$

where $s$ and $v$ denote a particle position and its corresponding
velocity in the search space, respectively, $k$ is the current step
number, $\omega$ is the inertia weight, $c_1$ and $c_2$ are the
acceleration constants, and $r_1$, $r_2$ are elements from two random
sequences in the range (0, 1). $s_j^{k}$ is the current position of
the particle, $\psi_j^{k}$ is the best of the solutions that this
particle has reached, and $\psi_g$ is the best of the solutions that
all the particles have reached. In general, the value of each
component of $v$ can be clamped to the range $[-V_{max}, +V_{max}]$ to
control excessive roaming of particles outside the search space
[28][29]. After calculating the velocity, the new position of each
particle is:

$$ s_j^{k+1} = s_j^{k} + v_j^{k+1} \tag{2} $$
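As a minimal sketch of Eqs. (1) and (2), the update of a single particle can be written in plain Python. The function name `pso_step`, its argument names, and the choice of drawing fresh random numbers for every component are illustrative assumptions, not the authors' implementation; the default constants are the values listed in Table 1.

```python
import random

def pso_step(s, v, p_best, g_best, w, c1=1.494, c2=1.494, v_max=12.0, rng=random):
    """One inertia-weight PSO update for a single particle (Eqs. 1 and 2).

    s, v   : current position and velocity (lists of floats)
    p_best : best position found so far by this particle
    g_best : best position found so far by the whole swarm
    w      : inertia weight for the current step
    Returns the new (position, velocity) pair.
    """
    new_s, new_v = [], []
    for sd, vd, pd, gd in zip(s, v, p_best, g_best):
        r1, r2 = rng.random(), rng.random()       # random factors in [0, 1)
        vel = w * vd + c1 * r1 * (pd - sd) + c2 * r2 * (gd - sd)
        vel = max(-v_max, min(v_max, vel))        # clamp to [-V_max, +V_max]
        new_v.append(vel)
        new_s.append(sd + vel)                    # position update, Eq. (2)
    return new_s, new_v
```

A particle that already sits on both its personal best and the global best, with zero velocity, stays where it is; a particle with a very large velocity moves by at most `v_max` per component.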

The procedure for calculating the output values from the input values
is described in detail in [30].

The net inputs ($N$) are calculated for the hidden neurons from the
input neurons. For a neuron $j$ in the hidden layer $h$:

$$ N_j^{h} = \sum_{i} w_{ji}^{h}\, p_i + b_j^{h} \tag{3} $$

where $p_i$ is the vector of training inputs, $w_{ji}^{h}$ is the
weight of the connection between input neuron $i$ and neuron $j$ of
the hidden layer $h$, and the term $b_j^{h}$ corresponds to the bias
of neuron $j$ of the hidden layer $h$, applied at its activation. The
PSO algorithm is very different from any of the traditional training
methods [28]. Each neuron contains a position and a velocity. The
position corresponds to the weight of a neuron
($s_j^{k} \to w_{ji}^{h}$). The velocity is used to update the weight.
Starting from these inputs, the outputs ($y_j$) of the hidden neurons
are calculated, using a transfer function $f^{h}$ associated with the
neurons of this layer:

$$ y_j = f^{h}\left( \sum_{i} w_{ji}^{h}\, p_i + b_j^{h} \right) \tag{4} $$

The transfer functions $f$ can be linear or non-linear. We used one
hidden layer with $f^{h}$ as a hyperbolic tangent function (tansig)
and $f^{o}$ as a linear function in the output layer:

$$ f^{h}(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}} \tag{5} $$


Table 1: Parameters used in the hybrid ANN+PSO algorithm.

ANN
  NN-type                                      feed-forward
  Number of hidden layers                      1
  Transfer function                            tansig
  Number of iterations                         1500
  Normalization range                          [-1, 1]
  Weight range                                 [-100, 100]
  Bias range                                   [-10, 10]
  Minimum error                                1e-3
PSO
  Number of particles in swarm ($N_{part}$)    50
  Number of iterations                         1500
  Cognitive component ($c_1$)                  1.494
  Social component ($c_2$)                     1.494
  Maximum velocity ($V_{max}$)                 12
  Minimum inertia weight ($\omega_{min}$)      0.5
  Maximum inertia weight ($\omega_{max}$)      0.7
  Objective function                           RMSE

All the neurons of the ANN have an associated activation value for a
given input pattern, and the algorithm continues by finding the error
for each neuron, except those of the input layer. After finding the
output values, the weights of all layers of the network are updated
by PSO, using Eqs. (1) and (2) [22]. The velocity is used to control
how much the position is updated. On each step, PSO evaluates each
set of weights using the data set. The network with the highest
fitness is considered the global best. The other weights are updated
based on the global best network rather than on their personal error
or fitness [28]. In this article, we used the mean square error (MSE)
to determine the network fitness for the entire training set:

$$ \mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( T_i^{\mathrm{real}} - T_i^{\mathrm{calc}} \right)^2 \tag{6} $$

where $T_i^{\mathrm{real}}$ is the real data and $T_i^{\mathrm{calc}}$
is the calculated output value obtained from the normalized output
($y_i$) of the network. This process was repeated for the total number
of patterns in the training set. For a successful process, the
objective of the algorithm is to update all the weights so as to
minimize the total root mean squared error (RMSE):

$$ \mathrm{RMSE} = \sqrt{\mathrm{MSE}} \tag{7} $$

$$ \epsilon = \min(\mathrm{RMSE}) \tag{8} $$
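To make the forward pass and the fitness of Eqs. (3)-(8) concrete, the following sketch decodes a flat particle position into the weights and biases of a network with one hidden layer and returns its RMSE over a set of patterns. The layout of the weight vector, the function names `forward`, `rmse_fitness` and `n_parameters`, and the single linear output neuron are assumptions made for illustration; the source does not specify how a particle position is mapped onto the weights.

```python
import math

def forward(particle, inputs, n_hidden):
    """Feed-forward pass: tanh hidden layer (Eqs. 3-5), linear output neuron.

    Assumed particle layout: for each hidden neuron its input weights followed
    by its bias, then the output weights and finally the output bias.
    """
    n_in = len(inputs)
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        w = particle[pos:pos + n_in]
        b = particle[pos + n_in]
        pos += n_in + 1
        net = sum(wi * pi for wi, pi in zip(w, inputs)) + b   # net input, Eq. (3)
        hidden.append(math.tanh(net))                          # tansig, Eqs. (4)-(5)
    w_out = particle[pos:pos + n_hidden]
    b_out = particle[pos + n_hidden]
    return sum(wo * h for wo, h in zip(w_out, hidden)) + b_out

def rmse_fitness(particle, patterns, n_hidden):
    """RMSE (Eqs. 6-7) of the network encoded by `particle` over all patterns."""
    sq = [(target - forward(particle, x, n_hidden)) ** 2 for x, target in patterns]
    return math.sqrt(sum(sq) / len(sq))

def n_parameters(n_in, n_hidden):
    """Dimension of the PSO search space for an n_in - n_hidden - 1 network."""
    return n_hidden * (n_in + 1) + n_hidden + 1
```

Under this layout, a network with four inputs, six hidden neurons and one output has `n_parameters(4, 6) == 37` weights and biases, which is the dimension of the search space explored by each particle.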


In PSO, the inertia weight $\omega$, the constants $c_1$ and $c_2$,
the number of particles $N_{part}$, and the maximum speed of a
particle $V_{max}$ are the parameters to be tuned for a given
problem. An exhaustive trial-and-error procedure was therefore applied
for tuning the PSO+ANN parameters. First, the effect of the population
$N_{part}$ was analyzed for values of 25 to 100 individuals in the
swarm. For other applications, some authors have shown that a larger
swarm increases the number of function evaluations needed to converge
to an error limit ED. In addition, Shi and Eberhart [32] illustrated
that the population size has hardly any effect on the performance of a
swarm algorithm. The top panel of Figure 1 shows that the best
population for this problem is 50 individuals. Next, the effect of
$\omega$ was analyzed for values of 0.1 to 0.9. The bottom panel of
Figure 1 shows the values of $\omega$ that favoured the search of the
particles and accelerated the convergence. This figure shows that, for
a linearly decreasing inertia weight starting at 0.7 and ending at
0.5, the PSO+ANN presents good convergence. A usual choice for the
acceleration coefficients is $c_1 = c_2$ [31]. The effect of varying
these constants was evaluated for the commonly used values of $c_1$
and $c_2$, such as 1.49 and 2.00 [31][32]. In this analysis,
$c_1 = c_2 = 1.49$ presents better convergence than the other values.
Table 1 shows the selected parameters for this hybrid algorithm.

Figure 1: Illustration of the behaviour of some parameters of the
ANN+PSO against the number of iterations. The top and bottom panels
correspond to the number of particles in the swarm ($N_{part}$) and
the inertia weight ($\omega$), respectively.

The step-by-step approach of PSO+ANN can be summarized as:

Step 1: Initialize the positions (weights and biases) and velocities
of a group of particles randomly. The particles represent the weight
vectors of the ANN, including biases. The dimension of the search
space is therefore the total number of weights and biases.

Step 2: The ANN is trained using the initial particle positions in
PSO. The learning error produced by the ANN can be treated as the
particle fitness value for the initial weights and biases. The current
best fitness achieved by particle $j$ is set as $\psi_j^{k}$. The
$\psi_j^{k}$ with the best value is set as $\psi_g$ and this value is
stored.

Step 3: Evaluate the desired optimization fitness function (Eq. 7)
over a given data set.

Step 4: Compare the evaluated fitness value of each particle ($F_j$)
with its $\psi_j^{k}$ value. If $F_j < \psi_j^{k}$, then
$\psi_j^{k} = F_j$, and the corresponding coordinates are those of the
best position of the particle so far.

Step 5: The objective function value is calculated for the new
positions of each particle. If a better position is achieved by an
agent, the $\psi_j^{k}$ value is replaced by the current value. As in
Step 2, the $\psi_g$ value is selected among the $\psi_j^{k}$ values.
If the new $\psi_g$ value is better than the previous one, it is
replaced by the current $\psi_g$ value and this value is stored: if
$F_j < \psi_g$, then $\psi_g = F_j$, which corresponds to the particle
with the overall best fitness among all particles in the swarm.

Step 6: The learning error at the current epoch is reduced by changing
the particle positions, which updates the weights and biases of the
network. Change the velocity and location of each particle according
to the movement equations (Eqs. 1 and 2). The new sets of positions
(weights and biases) are produced by adding the calculated velocity to
the current position. Then, the new sets of positions are used to
produce a new learning error in the ANN.

Step 7: This process is repeated until a stopping condition, either
the minimum learning error or the maximum number of iterations, is
met; otherwise, loop to Step 3.

Step 8: The optimum weights and biases for the ANN model are those
obtained by PSO, which gives the best training of the ANN.
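Steps 1-8 can be sketched as a generic minimization loop. In this sketch the fitness is any function of the particle position (for the ANN it would be the RMSE of the network encoded by that position). The function name `pso_minimize`, the initialization ranges, the seeded generator, the order in which personal and global bests are refreshed, and the quadratic toy fitness are assumptions for illustration. The inertia weight decreases linearly from 0.7 to 0.5, as described for the selected parameters.

```python
import random

def pso_minimize(fitness, dim, n_part=50, n_iter=1500, c1=1.494, c2=1.494,
                 w_max=0.7, w_min=0.5, v_max=12.0, bound=100.0,
                 min_error=1e-3, seed=0):
    """Inertia-weight PSO following Steps 1-8; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    # Step 1: random positions and velocities
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_part)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(dim)] for _ in range(n_part)]
    # Step 2: initial personal bests and global best
    p_best = [p[:] for p in pos]
    p_fit = [fitness(p) for p in pos]
    g = min(range(n_part), key=p_fit.__getitem__)
    g_best, g_fit = p_best[g][:], p_fit[g]
    for k in range(n_iter):
        if g_fit <= min_error:                      # Step 7: minimum-error criterion
            break
        w = w_max - (w_max - w_min) * k / max(1, n_iter - 1)   # linear decrease
        for j in range(n_part):
            for d in range(dim):                    # Step 6: Eqs. (1) and (2)
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[j][d] + c1 * r1 * (p_best[j][d] - pos[j][d])
                     + c2 * r2 * (g_best[d] - pos[j][d]))
                vel[j][d] = max(-v_max, min(v_max, v))
                pos[j][d] += vel[j][d]
            f = fitness(pos[j])                     # Step 3: evaluate fitness
            if f < p_fit[j]:                        # Step 4: personal best
                p_fit[j], p_best[j] = f, pos[j][:]
                if f < g_fit:                       # Step 5: global best
                    g_fit, g_best = f, pos[j][:]
    return g_best, g_fit                            # Step 8: optimum found

# Toy usage: minimize the distance to the point (3, 3) in two dimensions.
best, err = pso_minimize(lambda p: sum((x - 3.0) ** 2 for x in p) ** 0.5,
                         dim=2, n_part=20, n_iter=300, bound=10.0)
```

For the network training itself, `fitness` would wrap the RMSE of the ANN over the training patterns and `dim` would be the total number of weights and biases.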

In our time series analysis, if the input noise level contribution is
available, the RMSE in the training phase shall be computed as
follows:



where $\sigma_{N,i}$ is the noise level of each $i$-th element. Note
that $\sigma_{N,i} = \sigma_N$ under a white noise assumption.

Henceforth, we refer to the hybrid ANN+PSO defined above as the
standard ANN+PSO.

2.1 The stochastic ANN+PSO

Up to now, the standard ANN+PSO has not been developed to carry out
the error propagation of the input noise level contribution.
Nevertheless, once the standard ANN+PSO has been executed and has
provided the optimal topology, we can apply an additional method in
order to compute the uncertainty of the prediction.

Note that once the topology is established (number of hidden layers,
neurons in each hidden layer, transfer functions $f$, and weights and
biases ($w_{ji}^{h}$ and $b_j^{h}$)), the neural network acts as a
function (called function ANN) whose output depends only on the input
vector (see Eq. 4). The idea is to generate simulations from the input
data ($d_i = d(t)$) via a Gaussian random number generator in order to
propagate the intrinsic data noise through the function ANN.

For each $i$-th element of the input time series we generate $k$
simulations as:

$$ d_{i,k} = d_i + GR_k(\sigma_{N,i}) \tag{10} $$

where the input noise level $\sigma_{N,i}$ is known.
$GR(\sigma_{N,i})$ is a random number generator following a Gaussian
distribution with mean zero and standard deviation equal to
$\sigma_{N,i}$.

Finally, for the $i$-th element, each of the $k$ input data $d_{i,k}$
provides an output $y_{i,k}$. These $y_{i,k}$ are used to compute a
new estimator of the prediction ($\bar{y}_i$) and an error on the
prediction ($\sigma_{\bar{y}_i}$) as follows:

$$ \bar{y}_i = \langle y_{i,k} \rangle \quad \text{and} \quad \sigma_{\bar{y}_i}^{2} = \langle y_{i,k}^{2} \rangle - \langle y_{i,k} \rangle^{2} \tag{11} $$
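The stochastic procedure of Eqs. (10) and (11) amounts to a Monte Carlo propagation of the input noise through a fixed function. The sketch below assumes that the trained network is available as a Python callable `model` that takes an input vector; the names `propagate_noise` and `n_sim`, the seeded generator, and the number of simulations are illustrative choices, and the linear `model` in the usage example is only a stand-in for the function ANN.

```python
import math
import random

def propagate_noise(model, d, sigma, n_sim=1000, seed=0):
    """Monte Carlo estimate of the prediction and its uncertainty.

    model : callable mapping an input vector to a scalar output
    d     : nominal input vector (list of floats)
    sigma : noise level, either one float (white noise) or one value per input
    Returns (mean, std) of the outputs over n_sim perturbed inputs.
    """
    rng = random.Random(seed)
    sigmas = [sigma] * len(d) if isinstance(sigma, (int, float)) else list(sigma)
    outputs = []
    for _ in range(n_sim):
        # Perturb every input with zero-mean Gaussian noise, as in Eq. (10)
        d_k = [di + rng.gauss(0.0, si) for di, si in zip(d, sigmas)]
        outputs.append(model(d_k))
    mean = sum(outputs) / n_sim                              # new estimator
    var = sum(y * y for y in outputs) / n_sim - mean ** 2    # variance of the outputs
    return mean, math.sqrt(max(var, 0.0))

# Stand-in for the trained network: a linear map, so the expected spread is known.
y_bar, sigma_y = propagate_noise(lambda x: 2.0 * x[0] + x[1], [1.0, 0.5], 0.1,
                                 n_sim=20000)
```

For the linear stand-in, the spread should approach $0.1\sqrt{2^2 + 1^2} \approx 0.224$, which gives a quick sanity check before applying the same routine to a trained network.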


3 Noiseless chaotic time series prediction 


We computed the chaotic time series from the Mackey-Glass time-delay
differential system EISl, which is described as follows:

$$ \frac{dx}{dt} = -\beta x(t) + \frac{\alpha\, x(t-\tau)}{1 + x(t-\tau)^{10}} \tag{12} $$

where $x$ (unitless) is the series at time $t$, and $\tau$ is the time
delay. Here, we assumed $\alpha = 0.2$, $\beta = 0.1$ and
$x(0) = 1.2$. Note that if $\tau \geq 17$ the time series shows
chaotic behaviour [33][34]. The nominal Mackey-Glass time series is
obtained by numerical integration with a fourth-order Runge-Kutta
method. This series was computed with a time sampling of 1 second.
Thus, $x(t)$ is derived for $t$ from 0 up to the time horizon
considered, with $x(t) = 0$ for $t < 0$.

The Mackey-Glass chaotic time series with $\tau = 17$ is considered as
the nominal case (without noise contribution). Here, we generate two
thousand data points (a time horizon of 2000).
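A minimal sketch of how such a series can be generated is given below. It integrates the Mackey-Glass equation with a fourth-order Runge-Kutta step of 1 and, as a simplifying assumption, holds the delayed term $x(t-\tau)$ constant within each step (the source does not say how the delayed value is treated at the intermediate stages). The function name `mackey_glass` and its arguments are illustrative, and the exponent 10 is the standard Mackey-Glass choice.

```python
def mackey_glass(n_points=2000, tau=17, alpha=0.2, beta=0.1, x0=1.2, dt=1.0):
    """Mackey-Glass series sampled every `dt`, with x(t) = 0 for t < 0.

    Assumes tau is an integer multiple of dt so that the delayed value can be
    read directly from the stored history.
    """
    lag = int(round(tau / dt))
    x = [x0]
    for n in range(n_points - 1):
        x_tau = x[n - lag] if n >= lag else 0.0        # x(t - tau), zero history
        drive = alpha * x_tau / (1.0 + x_tau ** 10)    # delayed production term

        def f(xc):
            return -beta * xc + drive                  # right-hand side, delay frozen

        k1 = f(x[n])
        k2 = f(x[n] + 0.5 * dt * k1)
        k3 = f(x[n] + 0.5 * dt * k2)
        k4 = f(x[n] + dt * k3)
        x.append(x[n] + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    return x

series = mackey_glass()
```

With the zero history, the first `tau` steps are a pure exponential decay from `x0`; the delayed feedback only starts to act once $x(t-\tau)$ reaches the initial value.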

From this data set, the input is created as a vector using $d$ points
of the time series spaced $\Delta$ apart, i.e.,
$\mathbf{x}(t) = [x(t), x(t-\Delta), \ldots, x(t-(d-1)\Delta)]$. The
output is the value $x(t+T)$.

According to the standard analysis of the Mackey-Glass chaotic time
series, we consider four non-consecutive points of the chaotic time
series in order to predict the short-term $x(t+6)$:

$$ x(t+6) = F\left[x(t), x(t-6), x(t-12), x(t-18)\right] \tag{13} $$

where this standard test assumes $d = 4$ and $\Delta = T = 6$ [6][33].

For this input, the first thousand data points were used for learning
(training), while the others were used for validating the prediction
(prediction). In the ANN+PSO implementation for the nominal case, the
optimum number of neurons in the hidden layer, $N_{hl}$, was found to
be six, i.e., the architecture is described as 4-6-1.
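The construction of the input vectors of Eq. (13) and the training/prediction split can be sketched as follows. The helper names `make_patterns` and `split_patterns`, the ordering of the inputs inside each pattern, and the way the split is applied to the list of patterns are assumptions for illustration.

```python
def make_patterns(series, d=4, delta=6, horizon=6):
    """Build (inputs, target) pairs: inputs are x(t), x(t-delta), ...,
    x(t-(d-1)*delta) and the target is x(t+horizon)."""
    first_t = (d - 1) * delta                # earliest t with a full input vector
    last_t = len(series) - horizon - 1       # latest t with a known target
    patterns = []
    for t in range(first_t, last_t + 1):
        inputs = [series[t - m * delta] for m in range(d)]
        patterns.append((inputs, series[t + horizon]))
    return patterns

def split_patterns(patterns, n_train):
    """First n_train patterns for training, the rest for prediction."""
    return patterns[:n_train], patterns[n_train:]

# Toy series 0, 1, 2, ... so that each value equals its own time index.
toy = list(range(40))
pairs = make_patterns(toy)
train, test = split_patterns(pairs, n_train=10)
```

On the toy series the first pattern is built at $t = 18$, the earliest time for which $x(t-18)$ exists, and its target is the value at $t = 24$.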

Figure 2 presents a comparison between the recorded and predicted
values of the Mackey-Glass time series for the training and
prediction phases. This figure shows that, for the training and
validation phases, the nominal and reconstructed values are in total
agreement. In fact, for training, we computed a residual average,
$\langle x_{in} - x_{out} \rangle$, of -1.4 x 10“^ and a residual
maximum, $\max\{|x_{in} - x_{out}|\}$, of 3.20 x 10“^. Similar results
are obtained for the prediction phase, with a maximum of 3.22 x 10“^
and an average of -1.5 x 10“'^.

Figure 2: Performance of the ANN+PSO method on the Mackey-Glass
chaotic time series (noiseless). The top and bottom panels show the
training and prediction performance for the short-term $x(t+6)$
analysis, respectively. The grey and blue lines correspond to the
input ($x_{in}$) and output ($x_{out}$) data. The red line with
diamonds shows the difference between the input and output data
(scaled by a factor of 10“^).

Table 2: Root mean squared error (RMSE) reported for different
methods in the Mackey-Glass chaotic time series analysis.

Method                                        RMSE $x(t+6)$
Linear model [35]                             0.5503
Conjugate gradient ANN                        0.2296
Product operator T-norm [37]                  0.0907
Fuzzy system [38]                             0.0816
Cascade correlation NN [39]                   0.0624
Genetic algorithm and fuzzy system [40]       0.0490
Backpropagation NN [35]                       0.0262
Linguistic model (20 rules) [41]              0.0256
K-Nearest Neighbor 143                        0.0194
This work                                     0.0138


Table 2 shows the RMSE (for short-term prediction of the Mackey-Glass
chaotic time series) of different computational methods from the
literature, for example, the back-propagation NN [35], the conjugate
gradient ANN [36], the product operator T-norm [37], the fuzzy system
[38], etc. (see references in Table 2). For the ANN+PSO configuration
used here, RMSE = 0.014 indicates that the prediction performance
compares well with the other methods. Clearly, the inclusion of the
PSO approach improves on ANN-based methods without PSO, for example,
the conjugate gradient ANN (RMSE = 0.229) and the back-propagation NN
(RMSE = 0.026).

3.1 Chaotic behaviour 

Since the noiseless Mackey-Glass time series is a known system, it is
possible to assess the ability of the ANN+PSO method to reproduce its
chaotic behaviour. Figure 3 shows a representation of the chaotic
attractor obtained from the Mackey-Glass time series. This figure
shows that with $\tau = 17$ the system operates in a high-dimensional
regime. The Mackey-Glass system is infinite dimensional (because it is
a time-delay equation) and, thus, has an infinite number of Lyapunov
exponents ($\lambda_i$) 1^. The Lyapunov exponents of a dynamical
system are one of a number of invariants that characterize the
attractors of the system in a fundamental way 113. Table 3 shows a
comparison of the four largest Lyapunov exponents of the Mackey-Glass
system reported in oil with the Lyapunov exponents obtained with the
ANN+PSO method for $\tau = 17$.

Figure 3: Chaotic attractor for the Mackey-Glass noiseless chaotic
time series ($\tau = 17$).

Table 3: Lyapunov exponents reported in Farmer o versus those
calculated for the ANN+PSO method.

$\lambda_i$      $\lambda_{i,\mathrm{ANN+PSO}}$
 0.00860          0.00900
 0.00100          0.00132
-0.03950         -0.04100
-0.05050         -0.05000


An approach to determining an appropriate cutoff value for the number
of exponents can be related to the Lyapunov dimension iH. This idea
was originally explored by Kaplan and Yorke (H. Kaplan and Yorke
conjectured that this dimension ($D_{KY}$) is equal to the
information dimension 1145 1. In our case, $D_{KY}$ is computed as
2.10. Note that in Farmer 01, the authors reported a fractal dimension
$D_F = 2.13$ and a Lyapunov dimension, calculated with the Kaplan and
Yorke conjecture, of $D_{KY} = 2.10$.
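For reference, the Kaplan-Yorke (Lyapunov) dimension is obtained from an ordered spectrum $\lambda_1 \geq \lambda_2 \geq \dots$ as $D_{KY} = j + \sum_{i \leq j} \lambda_i / |\lambda_{j+1}|$, where $j$ is the largest index for which the partial sum is non-negative. A minimal sketch follows; the function name is illustrative, and the example spectrum is the commonly quoted approximate one for the Lorenz system, used here only as a stand-in and not taken from this study.

```python
def kaplan_yorke_dimension(exponents):
    """Kaplan-Yorke dimension from a list of Lyapunov exponents."""
    lam = sorted(exponents, reverse=True)
    partial = 0.0
    for j, value in enumerate(lam):
        if partial + value < 0.0:
            # The first j exponents have a non-negative sum; interpolate
            # linearly with the next (negative) exponent.
            return j + partial / abs(value)
        partial += value
    return float(len(lam))   # the sum never turns negative

# Illustrative spectrum (approximate Lorenz-system values, not from this paper).
d_ky = kaplan_yorke_dimension([0.9056, 0.0, -14.5723])
```

Because the formula only uses the exponents up to the first one that makes the partial sum negative, a truncated spectrum is sufficient as long as it includes that exponent.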

4 Noisy chaotic time series prediction 

In the previous section, the ANN+PSO has proven to be an efficient
method for the prediction of chaotic time series. Nevertheless, the
effects of noise on the hybrid ANN+PSO implementation have not been
studied so far.

In order to study the impact of noise on chaotic time series
prediction, we constructed the noisy time series by adding a noise
contribution to the nominal (noiseless) case. The noisy Mackey-Glass
chaotic time series, $x_i = x(t)$, is generated as:

$$ x_i = x_i^{\mathrm{nominal}} + \eta_i \tag{14} $$

where $\eta_i$ is the particular contribution of noise to the $i$-th
element. It is estimated as $\eta_i = GR(\sigma_{N,i})$, with
$GR(\sigma_{N,i})$ a Gaussian random number generator.

Note that $\sigma_{N,i}$ corresponds to the noise level considered.
Here, we assume that the original data are affected by white noise,
i.e., the noise level is the same for each $i$-th element,
$\sigma_{N,i} = \sigma_N$ *. Different white noise levels are
considered: $\sigma_N$ = 0.01, 0.04, 0.06, 0.08 and 0.1. These values
correspond approximately to 1 %, 4 %, 6 %, 9 % and 11 % of the
peak-to-peak amplitude of the nominal case. Figure 4 shows the noisy
chaotic time series for $\sigma_N$ equal to 0.01 (green), 0.04 (blue)
and 0.1 (red). As expected, the noisy time series with
$\sigma_N = 0.01$ is the closest to the nominal case. However, the
cases with $\sigma_N = 0.04$ and $\sigma_N = 0.1$ show a slightly more
modified shape relative to the noiseless case, in particular for
$\sigma_N = 0.1$.

Figure 4: Mackey-Glass chaotic time series considered in this work
($\tau = 17$), plotted against time. The black solid line shows the
noiseless case (nominal case). The green, blue and red lines
correspond to the Mackey-Glass noisy time series with a white noise
level ($\sigma_N$) contribution of 0.01, 0.04 and 0.1, respectively.


* To clarify: although the noise level $\sigma_N$ is the same at each
time, the noise contribution $\eta_i$ is not (the latter depends on
the Gaussian random number generator).

Figure 5: Impact of the noise on the architecture.


4.1 Noise effect on ANN+PSO 

The standard ANN+PSO is applied to our noisy time series, which
provides the optimum topology and the $y_i$ prediction. Then, the
stochastic ANN+PSO is run in order to obtain a new prediction
estimator $\bar{y}_i$ and the uncertainty of the prediction
($\sigma_{\bar{y}_i}$).

Impact on architecture. For each noisy time series, and in the
standard ANN+PSO implementation, we carried out a detailed study of
the architecture characterization. To determine the optimum $N_{hl}$,
the RMSE is computed for different numbers of neurons in the hidden
layer (from two up to thirty), as presented in Figure 5. For each
series, the optimum $N_{hl}$ is obtained when the RMSE reaches a
minimum. As expected, the characterization of the architecture is
strongly related to the noise level in the input data. For lower noise
(such as 0.01) the optimum $N_{hl}$ is clearly identified in Figure 5;
in contrast, in the most contaminated case ($\sigma_N = 0.1$) the
selection depends on the fourth decimal of the RMSE (0.1292, 0.1291
and 0.1293 for 19, 20 and 21 neurons in the hidden layer,
respectively). The RMSE and the optimum $N_{hl}$ are presented in
Table 4. Using these values, and according to the trend seen in
Figure 5, we fit a linear model, which provides a correlation with a
slope of 0.0085. Although the $N_{hl}$ for $\sigma_N = 0.08$ is not
well characterized by this model, we find a clear linear correlation
between the RMSE and the $N_{hl}$ for different noise levels. In this
context, and as an illustration, the inset (top-right side of
Figure 5) shows the relation between $N_{hl}$ and the noise level,
whose best linear fit model is $N_{hl} = 146\,\sigma_N + 4.7$.
Therefore the impact of noise on the architecture of this hybrid
neural network, for contributions lower than 0.1, can be characterized
by linear correlations of the RMSE with $N_{hl}$, and of $N_{hl}$ with
the input noise $\sigma_N$.

Figure 6: Predictions of the Mackey-Glass noisy chaotic time series
with a white noise contribution of $\sigma_N = 0.1$, plotted against
time. The grey solid line corresponds to the original Mackey-Glass
noisy chaotic time series. The red and blue lines show the results
from the standard ANN+PSO and the stochastic ANN+PSO, respectively.
The upper panel shows the $y_i$ and $\bar{y}_i$ predictions, and the
lower panel the residual contribution ($x_{in} - x_{out}$) of both
methods.

Table 4: Parameters used in the evaluation of the prediction
performance of the standard and stochastic ANN+PSO approaches.

Case                 $N_{hl}$   RMSE     RMSE ratio (noisy/noiseless)
Noiseless            6          0.0138   1
$\sigma_N = 0.01$    6          0.016    1.2
$\sigma_N = 0.04$    11         0.054    3.9
$\sigma_N = 0.06$    14         0.078    5.7
$\sigma_N = 0.08$    15         0.103    7.5
$\sigma_N = 0.1$     20         0.129    9.4
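The linear relation between the number of hidden neurons and the noise level quoted in this paragraph can be checked with an ordinary least-squares fit of the five noisy cases of Table 4. The sketch below is illustrative: the helper name `linear_fit` is an assumption, and the source does not state which fitting procedure was used.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Noisy cases of Table 4: noise level and optimum number of hidden neurons.
sigma_n = [0.01, 0.04, 0.06, 0.08, 0.10]
n_hl = [6, 11, 14, 15, 20]
slope, intercept = linear_fit(sigma_n, n_hl)
print(f"N_hl = {slope:.0f} * sigma_N + {intercept:.1f}")
```

Under these assumptions the fitted slope and intercept round to the 146 and 4.7 quoted in the text.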

The prediction performance. As an illustration, the predictions
obtained for the noisy case $\sigma_N = 0.1$ with the standard ANN+PSO
($y_i$) and the stochastic ANN+PSO ($\bar{y}_i$) procedures are
presented in Figure 6. As expected, even in this high noise level
case, the $y_i$ and $\bar{y}_i$ predictions are in total agreement.
Indeed, the RMSE obtained with both methods is the same (to the third
decimal) for each noisy case. For this reason, the RMSE values shown
in Table 4 represent the RMSE of both methods.

Figure 7: Impact of the noise on the prediction performance.

On the other hand, and as expected, the RMSE increases with the noise
level (see Figure 7). For example, we obtained RMSE values of 0.0138
and 0.13 for the noiseless and noisy (with $\sigma_N = 0.1$) cases,
respectively. From Figure 7 we observe a linear correlation between
the RMSE and the input noise level. The best fit model, without
considering the RMSE of the noiseless case, corresponds to
RMSE $= 1.3\,\sigma_N$, which shows a strong linear correlation.
Therefore, we confirm that a higher noise level in the input data
leads to a poorer prediction estimator, with an RMSE that grows
linearly with the input noise level.

Also, the ratio RMSE$_{noisy}$/RMSE$_{noiseless}$ (third column in
Table 4) can be used to study the impact of noise on the performance
efficiency of our implementation (with respect to the nominal case).
The bottom-right panel of Figure 7 shows the performance efficiency
against the input noise level. In the worst case, the performance
efficiency is degraded by one order of magnitude with respect to the
noiseless case. Even so, the standard and stochastic ANN+PSO prove to
be powerful tools for making predictions of chaotic time series.
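Both the proportional model between the RMSE and the noise level and the ratios in the third column of Table 4 follow directly from the tabulated RMSE values. The sketch below assumes a least-squares fit forced through the origin (the source does not state the fitting procedure); the variable names are illustrative.

```python
# RMSE values of Table 4 for the noisy cases, and the noiseless reference.
sigma_n = [0.01, 0.04, 0.06, 0.08, 0.10]
rmse = [0.016, 0.054, 0.078, 0.103, 0.129]
rmse_noiseless = 0.0138

# Least-squares slope of a line forced through the origin: sum(x*y) / sum(x*x).
slope = sum(x * y for x, y in zip(sigma_n, rmse)) / sum(x * x for x in sigma_n)

# Ratio of noisy to noiseless RMSE for each noise level, to one decimal.
ratios = [round(r / rmse_noiseless, 1) for r in rmse]
```

The slope comes out close to the 1.3 quoted in the text, and the ratios agree with the third column of Table 4 to within the rounding of the tabulated RMSE values.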

In the literature, we did not find a similar implementation (in terms
of the prediction horizon, type and level of noise, etc.) that allows
a straightforward comparison of results. As an example, we can
contrast our results with those presented by Sheng et al. 2012 (m.
They applied the Echo State Network based on dual estimation (ESN) to
a noisy Mackey-Glass time series (with a sampling of 2 seconds) with a
white noise level of $\sigma = 0.1$. However, their prediction horizon
was one step ahead, which is considerably shorter than ours. Even so,
let us carry out a plain comparison. Depending on the prediction
performance, they obtained RMSE values of 0.05 for the Generic ESN
(hereafter GESN) and 0.04 for the CKF/KF BASED ESN (henceforth CESN).
In this context, the impact of the noise on the performance efficiency
is lower in the ANN+PSO implementation (with respect to the ESN). In
fact, we have a performance efficiency ratio of 9.4, while they
obtained ratios of 1161 and 33.5 for GESN and CESN, respectively.

Figure 8: Predictions and uncertainties from the stochastic ANN+PSO
for the Mackey-Glass chaotic time series. This corresponds to the case
with a white noise of $\sigma_N = 0.1$. The grey solid line shows the
original Mackey-Glass noisy chaotic time series. The blue points with
error bars correspond to the $\bar{y}_i$ predictions and their
uncertainties $\sigma_{\bar{y}_i}$. For optimal display, the
uncertainties are also presented in the lower panel.

Prediction uncertainties. One of the main goals of this work is to
estimate the uncertainty of the prediction. The prediction
($\bar{y}_i$) and the error bars ($\sigma_{\bar{y}_i}$) obtained with
the stochastic ANN+PSO, for the noisy time series with
$\sigma_N = 0.1$, are presented in Figure 8. We confirm that our
forecast and the input data, for this strong noise contribution, are
in agreement at one sigma (about 68% confidence level) when the error
bars are considered. The uncertainties obtained are presented in the
lower panel of Figure 8. We found a minimum and maximum uncertainty of
0.024 and 0.13, respectively, with an average of
$\langle \sigma_{\bar{y}_i} \rangle = 0.07$. This value is lower than
the input noise level
($\langle \sigma_{\bar{y}_i} \rangle / \sigma_N = 0.7$), which shows
the impact of the error propagation in our method. According to
Figure 8, no relationship between the uncertainties and time is
apparent.

Finally, from the preceding figures, we have shown that the ANN+PSO
(with the standard and/or the stochastic implementation) is a robust
tool for the short-term prediction of time series affected by white
noise. In addition, the ANN+PSO method can now provide, for the first
time, an estimate of the uncertainty of the prediction.


5 Conclusions 

In this paper, a hybrid algorithm based on an artificial neural
network and particle swarm optimization (ANN+PSO) is used for the
short-term $x(t+6)$ prediction of the Mackey-Glass chaotic time
series. In addition, a study of the impact of noise on our hybrid
method is presented. Based on the results and discussion presented in
this study, we draw the following conclusions:

• The current value $x(t)$ and the past values used have an
influential effect on the training and prediction capabilities of the
chosen network.

• In the noiseless case, the simulation shows that this hybrid ANN+PSO
algorithm is a very powerful tool for the prediction of chaotic time
series, and the low deviations found with the proposed method show an
accuracy comparable with other methods available in the literature.

• In noisy cases, we have shown that the hybrid ANN+PSO is a robust
tool for the short-term prediction of chaotic time series affected by
white noise.

• The impact of the noise on the topology and performance efficiency
of the ANN+PSO is important. However, this study shows that the error
propagation through the ANN+PSO has a linear behaviour, which
generates a linear relationship between the RMSE (optimization
parameter) and the input noise level. Therefore, the PSO optimization
provides a linearity which ensures that the neural network will
converge to an appropriate solution, even if a noise contribution is
present.

• For noisy cases, although a straightforward comparison with the
literature is unavailable, the performance efficiency ratio indicates
that the standard/stochastic ANN+PSO implementation is affected to a
lesser degree than other similar implementations.